Zipf's law arises naturally in structured, high-dimensional data

Authors

  • Laurence Aitchison
  • Nicola Corradi
  • Peter E. Latham
Abstract

Zipf’s law, which states that the probability of an observation is inversely proportional to its rank, has been observed in many different domains. Although there are models that explain Zipf’s law in each of them, there is not yet a general mechanism that covers all, or even most, domains. Here we propose such a mechanism. It relies on the observation that real world data is often generated from some underlying, often low dimensional, causes — low dimensional latent variables. Those latent variables mix together multiple models that do not obey Zipf’s law, giving a model that does obey Zipf’s law. In particular, we show that when observations are high dimensional, latent variable models lead to Zipf’s law under very mild conditions — conditions that are typically satisfied for real world data. We identify an underlying latent variable for language, neural data, and amino acid sequences, and we speculate that yet to be uncovered latent variables are responsible for Zipf’s law in other domains.
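
The mixing mechanism described in the abstract can be illustrated with a small exact calculation. The sketch below is a toy example and not the authors' derivation: it assumes binary observations of `N_BITS` independent bits, a single latent variable `p` (the probability that a bit is 1), and a uniform prior on `p`. All function names are hypothetical. With `p` fixed, the rank-frequency curve is far from a slope of -1 on log-log axes; averaging over `p` gives a slope close to -1, as Zipf's law requires.

```python
import math

def rank_frequency(groups):
    """Turn (log_prob, count) groups of equiprobable states into
    (log rank, log prob) points, one point per group."""
    ordered = sorted(groups, key=lambda g: -g[0])  # most probable first
    points, rank = [], 0
    for log_p, count in ordered:
        rank += count  # rank of the last state in this group
        points.append((math.log(rank), log_p))
    return points

def fit_slope(points):
    """Ordinary least-squares slope of log prob against log rank."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var

def fixed_p_model(n_bits, p):
    """No latent variable: every bit is 1 with the same fixed probability p.
    A state with k ones has probability p^k (1-p)^(n-k)."""
    return [
        (k * math.log(p) + (n_bits - k) * math.log(1 - p), math.comb(n_bits, k))
        for k in range(n_bits + 1)
    ]

def uniform_mixture_model(n_bits):
    """Latent variable p drawn uniformly from [0, 1] (an assumption).
    Integrating p^k (1-p)^(n-k) over p gives 1 / ((n+1) * C(n, k))."""
    return [
        (-math.log(n_bits + 1) - math.log(math.comb(n_bits, k)),
         math.comb(n_bits, k))
        for k in range(n_bits + 1)
    ]

N_BITS = 100  # dimensionality of each observation (illustrative choice)

slope_fixed = fit_slope(rank_frequency(fixed_p_model(N_BITS, 0.1)))
slope_mixed = fit_slope(rank_frequency(uniform_mixture_model(N_BITS)))

print(f"fixed p = 0.1  : log-log slope = {slope_fixed:.2f}")
print(f"p ~ Uniform(0,1): log-log slope = {slope_mixed:.2f}")
```

States with the same number of ones share a probability, so the calculation works on 101 groups rather than all 2^100 states. A single fitted slope is only a rough summary; plotting the points from `rank_frequency` shows the difference between the two curves more directly.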

Similar resources

Zipf’s Law Arises Naturally When There Are Underlying, Unobserved Variables

Zipf's law, which states that the probability of an observation is inversely proportional to its rank, has been observed in many domains. While there are models that explain Zipf's law in each of them, those explanations are typically domain specific. Recently, methods from statistical physics were used to show that a fairly broad class of models does provide a general explanation of Zipf's law...

Zipf's law and criticality in multivariate data without fine-tuning.

The joint probability distribution of states of many degrees of freedom in biological systems, such as firing patterns in neural networks or antibody sequence compositions, often follows Zipf's law, where a power law is observed on a rank-frequency plot. This behavior has been shown to imply that these systems reside near a unique critical point where the extensive parts of the entropy and ener...

Zipf’s Law: A Microfoundation

Existing explanations of Zipf’s law (Pareto exponent approximately equal to 1) in size distributions require strong assumptions on growth rates or the minimum size. I show that Zipf’s law naturally arises in general equilibrium when individual units solve a homogeneous problem (e.g., homothetic preferences, constant-returns-to-scale technology), the units appear and disappear at a small constan...

Evolution of Scaling Emergence in Large-Scale Spatial Epidemic Spreading

BACKGROUND Zipf's law and Heaps' law are two representatives of the scaling concepts, which play a significant role in the study of complexity science. The coexistence of the Zipf's law and the Heaps' law motivates different understandings on the dependence between these two scalings, which has still hardly been clarified. METHODOLOGY/PRINCIPAL FINDINGS In this article, we observe an evolutio...

Zipf's Law everywhere

At the 100th anniversary of the birth of George Kingsley Zipf, one striking fact about the statistical regularity that bears his name, Zipf's law, is that it seems to appear everywhere. We may ask these questions related to the ubiquity of Zipf's law: Is there a rigorous test in fitting real data to Zipf's law? In how many forms does Zipf's law appear? In which fields are the data sets claiming...

Publication date: 2014